Multi-lingual Text Leveling
Authors
Abstract
Determining the language proficiency level required to understand a given text is a key requirement in vetting documents for use in second language learning. In this work, we describe our approach to developing an automatic text analytic that estimates text difficulty on the Interagency Language Roundtable (ILR) proficiency scale. We use machine translation to translate a non-English document into English and then apply an ILR level detector trained on English text. We achieve good results in predicting ILR levels with both human and machine translations of Farsi documents. We also report text leveling results on human translations into English of documents from 54 languages.
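The translate-then-level pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: `estimate_ilr_level`, `toy_translate`, and `toy_detect_level` are hypothetical names, and the toy functions are placeholders standing in for a real machine translation engine and a detector trained on ILR-annotated English documents.

```python
from typing import Callable

# Hypothetical type aliases; neither interface comes from the paper.
Translator = Callable[[str], str]        # source-language text -> English text
LevelDetector = Callable[[str], float]   # English text -> estimated ILR level


def estimate_ilr_level(document: str,
                       translate: Translator,
                       detect_level: LevelDetector,
                       is_english: bool = False) -> float:
    """Translate a non-English document into English, then level the English text."""
    english_text = document if is_english else translate(document)
    return detect_level(english_text)


# Toy stand-ins so the sketch runs end to end (assumptions, not real models).
def toy_translate(text: str) -> str:
    # Placeholder: returns the text unchanged instead of translating it.
    return text


def toy_detect_level(english_text: str) -> float:
    # Placeholder heuristic: longer average word length gives a higher score,
    # clamped to 0-5 because ILR base levels run from 0 to 5.
    words = english_text.split()
    if not words:
        return 0.0
    avg_len = sum(len(w) for w in words) / len(words)
    return max(0.0, min(5.0, avg_len - 2.0))
```

Keeping the translator and the detector as separate callables mirrors the design in the abstract: the same English-trained detector can be reused for any source language by swapping only the translation step.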
Similar resources
Identifying Similarity in Text: Multi-Lingual Analysis for Summarization
English-Persian Plagiarism Detection based on a Semantic Approach
Plagiarism which is defined as “the wrongful appropriation of other writers’ or authors’ works and ideas without citing or informing them” poses a major challenge to knowledge spread publication. Plagiarism has been placed in four categories of direct, paraphrasing (rewriting), translation, and combinatory. This paper addresses translational plagiarism which is sometimes referred to as cross-li...
Text categorization on a multi-lingual corpus
This paper presents experiments with a hierarchical text categorizer on a multi-lingual (English, French) corpus. The results obtained are very similar for both languages. The results allow us to apply in the near future cross-language text categorization that can be used to support automatic translation to create multi-lingual topic glossary.
Multi-Lingual Text Generation and the Meaning-Text Theory
We describe multi-lingual text generation as an alternative to automatic translation in specified technical sublanguages, illustrating the notion with the implemented RAREAS-2 system for synthesizing marine weather forecasts in English and French. We then review the Meaning-Text Theory (MTT) of Mel'cuk et al. as we have applied it to text generation in the GOSSIP system for producing English re...
Enhancing Multi-lingual Information Extraction via Cross-Media Inference and Fusion
We describe a new information fusion approach to integrate facts extracted from cross-media objects (videos and texts) into a coherent common representation including multi-level knowledge (concepts, relations and events). Beyond standard information fusion, we exploited video extraction results and significantly improved text Information Extraction. We further extended our methods to multi-lin...